De Novo Genome Assembly ◾ 105
If the installation is successful, you can run the following to display the help:
busco --help
BUSCO provides ortholog databases (lineage datasets) for many clades of organisms. Before
running BUSCO, you need to choose the dataset that matches the organism being assessed.
The available datasets can be listed with the following command:
busco --list-datasets
Now, we can use BUSCO to assess the three E. coli assemblies (one generated with ABySS
and two generated by SPAdes). We can save the output of each assessment in a separate
directory.
busco \
-i abyss_ecoli_ass.fasta \
-o abyss_ecoli_ass.out \
-l bacteria \
-m genome
busco \
-i spades_ecoli_ass.fasta \
-o spades_ecoli_ass.out \
-l bacteria \
-m genome
busco \
-i spades_hyb_ecoli_ass.fasta \
-o spades_hyb_ecoli_ass.out \
-l bacteria \
-m genome
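The three commands differ only in the file prefix, so they can also be wrapped in a loop. The sketch below assumes the naming used above (input `<prefix>.fasta`, output directory `<prefix>.out`). The function name `run_busco_batch` is hypothetical, and the BUSCO options are exactly those used in the commands above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run BUSCO in genome mode with the bacteria dataset on each
# assembly prefix given, reading <prefix>.fasta and writing to <prefix>.out.
run_busco_batch() {
  local prefix
  for prefix in "$@"; do
    # Skip prefixes whose FASTA file is missing instead of letting BUSCO fail.
    if [ ! -f "${prefix}.fasta" ]; then
      echo "skipping ${prefix}: ${prefix}.fasta not found" >&2
      continue
    fi
    busco \
      -i "${prefix}.fasta" \
      -o "${prefix}.out" \
      -l bacteria \
      -m genome || echo "BUSCO failed for ${prefix}" >&2
  done
}
```

Calling `run_busco_batch abyss_ecoli_ass spades_ecoli_ass spades_hyb_ecoli_ass` would reproduce the three runs one after another. Running them sequentially keeps the example simple; on a machine with enough cores, BUSCO's own CPU option is usually a better way to speed things up than launching the runs in parallel.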
The BUSCO assessment output for each assembly will be saved in a separate directory:
“abyss_ecoli_ass.out”, “spades_ecoli_ass.out”, and “spades_hyb_ecoli_ass.out”. Each of
these directories contains a summary report in both text and JSON formats, along with
subdirectories holding the predicted genes and the ortholog database that was used.
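To compare the assemblies side by side, the one-line summary can be pulled out of each text report. This sketch assumes the report's file name begins with `short_summary`, which is how recent BUSCO versions name it (check the exact name in your own output directory). It also assumes the report contains a line of the form `C:...%[S:...%,D:...%],F:...%,M:...%,n:...`. The function name `collect_busco_summaries` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<directory><TAB><one-line summary>" for each BUSCO
# output directory given. Assumes the text report is named short_summary*.txt and
# contains a line like C:98.4%[S:98.4%,D:0.0%],F:1.6%,M:0.0%,n:124.
collect_busco_summaries() {
  local dir report f line
  for dir in "$@"; do
    report=""
    # Use the first text report found in the directory, if any.
    for f in "$dir"/short_summary*.txt; do
      if [ -f "$f" ]; then
        report=$f
        break
      fi
    done
    if [ -z "$report" ]; then
      printf '%s\tno short_summary*.txt report found\n' "$dir"
      continue
    fi
    # Extract the C:...n:... summary string from the report.
    line=$(grep -m 1 -o 'C:[0-9.]*%.*n:[0-9]*' "$report")
    printf '%s\t%s\n' "$dir" "$line"
  done
}
```

For example, `collect_busco_summaries abyss_ecoli_ass.out spades_ecoli_ass.out spades_hyb_ecoli_ass.out` prints one tab-separated line per assembly. A missing report is flagged rather than silently skipped.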
Comparing the three assemblies on the BUSCO assessment metrics (Figures 3.13–3.15), the
two assemblies generated by SPAdes are better than the one generated by ABySS. The
bacteria lineage dataset used for the assessment was built from 4085 genomes and contains
124 BUSCO genes (n:124). The E. coli assembly generated by SPAdes shows 100% completeness
(C:100%), no duplicated genes (D:0.0%), no fragmented genes (F:0.0%), and no missing genes
(M:0.0%) out of the 124 genes. In contrast, the BUSCO report for the assembly generated by
ABySS shows C:98.4% [S:98.4%, D:0.0%], F:1.6%, M:0.0%, n:124. This indicates 98.4%
completeness (122 genes are recovered as complete, single-copy genes), 1.6% fragmentation
(2 genes are only partially recovered), and no missing genes.
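The gene counts in parentheses follow directly from the percentages: each percentage is multiplied by n and divided by 100, then rounded to the nearest gene (98.4 × 124 / 100 ≈ 122 and 1.6 × 124 / 100 ≈ 2). The hypothetical helper `busco_counts` below automates that arithmetic for a one-line summary. It assumes the single-letter `KEY:value` format quoted above, with percentages for C, S, D, F, and M and a final `n:` count.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn a BUSCO one-line summary such as
#   C:98.4%[S:98.4%,D:0.0%],F:1.6%,M:0.0%,n:124
# into approximate gene counts (percentage * n / 100, rounded to the nearest gene).
busco_counts() {
  printf '%s\n' "$1" | awk '
  {
    s = $0
    # Collect every KEY:value pair (C, S, D, F, M, n) and drop the trailing % sign.
    while (match(s, /[A-Za-z]:[0-9.]+%?/)) {
      pair = substr(s, RSTART, RLENGTH)
      key = substr(pair, 1, 1)
      val = substr(pair, 3)
      sub(/%$/, "", val)
      v[key] = val
      s = substr(s, RSTART + RLENGTH)
    }
    n = v["n"]
    split("C S D F M", keys, " ")
    # Print each category that is present, converted back to a gene count.
    for (i = 1; i <= 5; i++) {
      k = keys[i]
      if (k in v)
        printf "%s: %s%% = %d of %d genes\n", k, v[k], int(v[k] * n / 100 + 0.5), n
    }
  }'
}

busco_counts "C:98.4%[S:98.4%,D:0.0%],F:1.6%,M:0.0%,n:124"
```

For the 98.4% summary quoted above, this reports 122 of 124 genes as complete and 2 of 124 as fragmented. The helper also accepts the spaced form `C:98.4% [S:98.4%, D:0.0%], ...`. Because BUSCO rounds percentages to one decimal place, the back-calculated counts are approximate in general. For n = 124, however, one gene corresponds to about 0.8%, so the rounding is unambiguous.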
Combining statistical assessment (contiguity metrics such as N50) with evolutionary
assessment (gene completeness measured by BUSCO) gives a more reliable picture of the
quality of a de novo assembled genome than either approach alone.